Skip to content

Conversation

@huleilei
Copy link
Contributor

Changes Made

Migrate the multi-column sort comparator construction path from Arrow2 to the Arrow-rs-based export, focusing specifically on build_multi_array_bicompare in src/daft-core/src/array/ops/sort.rs.

  • Refactor build_multi_array_bicompare to remove direct reliance on Series::to_arrow2().

  • For each (left[i], right[i]) series pair:

    • Use the Arrow-rs export path Series::to_arrow() to obtain canonical arrow::array::ArrayRef representations.

    • Bridge these Arrow-rs arrays into the legacy Arrow2 daft_arrow::array::Array representation via the existing From<&dyn arrow_array::Array> for Box<dyn Array> conversion (behind the arrow feature).

    • Pass the bridged Arrow2 arrays into kernels::search_sorted::build_nulls_first_compare_with_nulls to construct per-column DynComparators that respect both descending (reversed sort direction) and nulls_first (null ordering) semantics.

  • Keep the combined multi-column comparator logic unchanged: the returned comparator still iterates the per-column comparators and returns on the first non-Equal ordering.

Related Issues

Part of Daft’s staged Arrow2 → Arrow-rs migration (#5741).

@huleilei huleilei marked this pull request as draft January 25, 2026 11:23
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 25, 2026

Greptile Overview

Greptile Summary

Migrated the multi-column sort comparator construction from Arrow2 to Arrow-rs in build_multi_array_bicompare. The function now calls Series::to_arrow() to obtain Arrow-rs arrays, bridges them to Arrow2 via the From<&dyn arrow_array::Array> trait, then passes them to the existing build_nulls_first_compare_with_nulls comparator builder.

Key changes:

  • Replaced direct to_arrow2() calls with to_arrow() followed by explicit Arrow-rs → Arrow2 conversion
  • Added intermediate l_arrow and r_arrow variables to hold Arrow-rs ArrayRef representations
  • Used Box::from(array.as_ref() as &dyn Array) to bridge Arrow-rs arrays into Arrow2 format
  • Preserved existing multi-column comparator logic that respects descending and nulls_first semantics
  • Added #[allow(deprecated)] attributes for Arrow2 migration transitional code

This is part of the staged Arrow2 → Arrow-rs migration effort and maintains backward compatibility while progressively modernizing the array backend.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The refactoring is a straightforward migration from Arrow2 to Arrow-rs using well-established conversion paths. The change replaces to_arrow2() with to_arrow() followed by the existing From<&dyn arrow_array::Array> for Box<dyn Array> conversion trait. The core comparator logic remains unchanged, preserving existing sort behavior and null handling semantics. The conversion path through to_data() and from_data() is a standard Arrow interop mechanism that preserves all array properties.
  • No files require special attention

Important Files Changed

Filename Overview
src/daft-core/src/array/ops/sort.rs Refactored build_multi_array_bicompare to use Arrow-rs export path (to_arrow()) with bridge conversion to Arrow2, maintaining existing comparator logic and null handling semantics.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant build_multi_array_bicompare
    participant Series
    participant ArrowRS as Arrow-rs Array
    participant Converter as From Trait
    participant Arrow2 as Arrow2 Array
    participant Comparator as build_nulls_first_compare_with_nulls

    Caller->>build_multi_array_bicompare: left, right, descending, nulls_first
    loop For each (left[i], right[i]) pair
        build_multi_array_bicompare->>Series: l.to_arrow()?
        Series-->>build_multi_array_bicompare: l_arrow: ArrayRef (Arrow-rs)
        build_multi_array_bicompare->>Series: r.to_arrow()?
        Series-->>build_multi_array_bicompare: r_arrow: ArrayRef (Arrow-rs)
        
        build_multi_array_bicompare->>Converter: Box::from(l_arrow.as_ref() as &dyn Array)
        Converter->>ArrowRS: to_data()
        ArrowRS-->>Converter: ArrayData
        Converter->>Arrow2: from_data()
        Arrow2-->>build_multi_array_bicompare: l_arrow2: Box<dyn daft_arrow::array::Array>
        
        build_multi_array_bicompare->>Converter: Box::from(r_arrow.as_ref() as &dyn Array)
        Converter->>ArrowRS: to_data()
        ArrowRS-->>Converter: ArrayData
        Converter->>Arrow2: from_data()
        Arrow2-->>build_multi_array_bicompare: r_arrow2: Box<dyn daft_arrow::array::Array>
        
        build_multi_array_bicompare->>Comparator: build_nulls_first_compare_with_nulls(l_arrow2, r_arrow2, desc, nf)
        Comparator-->>build_multi_array_bicompare: DynComparator
        build_multi_array_bicompare->>build_multi_array_bicompare: cmp_list.push(comparator)
    end
    
    build_multi_array_bicompare->>build_multi_array_bicompare: Create combined_comparator closure
    Note over build_multi_array_bicompare: Iterates cmp_list, returns first non-Equal ordering
    build_multi_array_bicompare-->>Caller: combined_comparator: DynComparator
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@codecov
Copy link

codecov bot commented Jan 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.90%. Comparing base (538c593) to head (8c28cc0).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6087      +/-   ##
==========================================
- Coverage   72.91%   72.90%   -0.02%     
==========================================
  Files         973      973              
  Lines      126184   126200      +16     
==========================================
- Hits        92011    92009       -2     
- Misses      34173    34191      +18     
Files with missing lines Coverage Δ
src/daft-core/src/array/ops/sort.rs 76.55% <100.00%> (+0.19%) ⬆️

... and 8 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@huleilei huleilei marked this pull request as ready for review January 25, 2026 12:41
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@huleilei
Copy link
Contributor Author

@universalmind303 help me review when you are convenient. Thanks

Copy link
Member

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is exactly what we're looking for. This pr just replaces deprecated calls with other deprecated calls. Instead we need to rewrite the entire sort kernels to be arrow-rs.

@huleilei
Copy link
Contributor Author

Thanks for the review and the clarification!

You're right – in its current form this PR is essentially just moving the call site from to_arrow2() to to_arrow() and then bridging back into Arrow2, and it still relies on the deprecated Arrow2-based comparator path with #[allow(deprecated)]. That doesn't really reduce the Arrow2 surface area or move the sort kernels themselves onto Arrow-rs.

My initial goal here was to take a very small step by removing the direct to_arrow2() usage from the caller side, but I agree this ends up being more of a cosmetic refactor than the kind of structural change you're looking for.

Given your feedback, I'm happy to:

  1. Close or repurpose this PR, and
  2. Work on a follow-up that actually rewrites the sort kernels (e.g. the logic behind build_nulls_first_compare_with_nulls / search_sorted and the sort comparators) to operate directly on Arrow-rs arrays and buffers, so that:
    • comparators and sort indices are implemented purely in Arrow-rs,
    • there is no Arrow-rs → Arrow2 bridging in the hot path, and
    • the deprecated Arrow2 kernels and conversion helpers can be removed instead of just being wrapped.

In the meantime, I'll mark this PR as draft so it doesn't block the proper Arrow-rs sort kernel rewrite.

Before I start that work, could you confirm if you have a preferred target for the first “fully Arrow-rs” sort kernel (e.g. search_sorted / multi-column comparators), or if you'd rather see a proposal/plan in an issue first? I'm happy to align with whatever migration plan you already have in mind for Arrow-rs sort.

@huleilei huleilei marked this pull request as draft January 27, 2026 14:42
@universalmind303
Copy link
Member

Before I start that work, could you confirm if you have a preferred target for the first “fully Arrow-rs” sort kernel (e.g. search_sorted / multi-column comparators), or if you'd rather see a proposal/plan in an issue first? I'm happy to align with whatever migration plan you already have in mind for Arrow-rs sort.

I don't think there's really a solid proposal or plan in place. What I'm really looking for in PR's is a fully self contained part of the codebase is moved over from arrow2 to arrow-rs. That could be a module, a single function, the entire kernel, or any other atomic unit of code.

@universalmind303
Copy link
Member

@huleilei, I added some context to the arrow2 migration discussion that is likely helpful for you

#5741 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants